Abstract

This report introduces a parsimonious structure for mixtures of
autoregressive models, in which the weighting coefficients are determined
through latent random variables as functions of all past observations.
These variables follow a hidden Markov model. We modify the EM and
Baum-Welch algorithms to estimate the parameters of the model.
1 Hidden Markov Mixture Autoregressive Model

Let $Y = \{Y_t\}_{t=0}^{T}$ be a sequence of continuous random variables,
where $y_t$ is a realization of $Y_t$. Also let
$\mathcal{F}_t = \sigma\{Y_s : s \le t\}$ denote the sigma-field of all
information up to time $t$, $F(y_t \mid \mathcal{F}_{t-1})$ the conditional
distribution function of $Y_t$ given past information, and
$\alpha_t^{(h)} = \alpha^{(h)}(y_1, \ldots, y_{t-1})$. In addition,
$\{Z_t\}_{t \ge p}$ denotes a hidden or latent process which forms a
positive recurrent Markov chain on a finite set $E = \{1, 2, \ldots, K\}$,
with initial conditional probabilities

$$\rho = (\rho_1, \ldots, \rho_K)', \qquad \rho_h = P(Z_p = h \mid y_0, \ldots, y_{p-1}), \quad h = 1, \ldots, K, \tag{1}$$

and transition probability matrix

$$P = \|\pi_{i,j}\|_{K \times K}, \tag{2}$$

in which

$$\pi_{ij} = P(Z_t = j \mid Z_{t-1} = i), \quad i, j \in \{1, \ldots, K\}. \tag{3}$$

The invariant probability measure is denoted by

$$\mu = (\alpha_1, \ldots, \alpha_K)', \tag{4}$$

where $\alpha_j = \lim_{t \to \infty} P(Z_t = j)$.
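
As a small numerical illustration of the invariant measure in (4), the
following sketch solves $\mu = \mu P$ together with the normalisation
$\sum_j \alpha_j = 1$. The function name and the two-state matrix are
hypothetical and chosen only for illustration; the sketch assumes an
irreducible chain, so that the solution is unique.

```python
import numpy as np

def stationary_distribution(P):
    """Return mu with mu = mu P and sum(mu) = 1 (assumes an irreducible chain)."""
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    # Stack the K equations (P' - I) mu = 0 with the normalisation 1' mu = 1.
    lhs = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    rhs = np.zeros(K + 1)
    rhs[-1] = 1.0
    mu, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return mu

# Hypothetical two-state transition matrix, rows indexed by the current state.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = stationary_distribution(P)
```

For this matrix the balance equation $0.1\,\alpha_1 = 0.2\,\alpha_2$ gives
$\mu = (2/3, 1/3)'$.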

We say that $\{Y_t\}$ follows a Hidden Markov-Mixture Autoregressive model,
HM-MAR($K$, $p$), with $K$ normal distributions and $p$ lagged observations
in the AR processes, if the conditional distribution of $Y_t$ given
$\mathcal{F}_{t-1}$ satisfies:

i. For $t = p$,

$$F(y_p, Z_p = h \mid \mathcal{F}_{p-1}) = \rho_h \, \Phi\left(\frac{y_p - a_{0,h} - a_{1,h} y_{p-1} - \cdots - a_{p,h} y_0}{\sigma_h}\right), \tag{5}$$

ii. For $t \ge p + 1$,

$$F(y_t \mid \mathcal{F}_{t-1}) = \sum_{h=1}^{K} \alpha_t^{(h)} \, \Phi\left(\frac{y_t - a_{0,h} - a_{1,h} y_{t-1} - \cdots - a_{p,h} y_{t-p}}{\sigma_h}\right), \tag{6}$$

where $\alpha_t^{(h)} = P(Z_t = h \mid \mathcal{F}_{t-1})$ and $\Phi(\cdot)$ is
the standard normal distribution function.

The latent random variables $\{Z_t\}_{t \ge p+1}$ determine the contribution
of each distribution in the mixture model. Conditional on $Z_t$,
$\{Y_t, t \in \mathbb{N}\}$ is $p$-tuple Markov and independent of
$\{Z_s, s \ne t\}$. So, conditional on $\{Y_{t-1}, \ldots, Y_{t-p}\}$ and
$Z_t$, $Y_t$ is independent of $\{Y_s, s < t - p\}$ and $\{Z_s, s \ne t\}$.

The novelty of the HM-MAR model is that the contribution of each
distribution in the mixture structure is not of a predefined fixed form.
Although the HM-MAR model uses all past observations from $Y_0$ to
$Y_{t-1}$, the hidden Markov assumption on the process $\{Z_t\}_{t \ge p}$
enables us to build a parsimonious model.

The MAR model [3] can be considered a special case of the HM-MAR model
((5)-(6)) in which the transition matrix $P$ of the process
$\{Z_t\}_{t \ge p}$ has $K$ identical rows, i.e.
$P(Z_t = i \mid Z_{t-1} = j) = \alpha_i$ for all $i, j = 1, \ldots, K$, so
that $\{Z_t\}_{t \ge p+1}$ are independent and identically distributed.

The HM-MAR model also reduces to a hidden Markov model on a general state
space when $p$ is taken to be zero in (6) (i.e. $Y_t$ given $Z_t$ is
independent of past observations).
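
To make the generative structure of the model concrete, here is a minimal
simulation sketch. It assumes Gaussian innovations within each regime
(matching the normal components above), sets the first $p$ observations to
zero, and uses 0-based state labels; the function name `simulate_hm_mar` and
all parameter values are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_hm_mar(T, A, sigma, P, rho, seed=0):
    """Draw y[0..T-1] and hidden states z from an HM-MAR(K, p) model.

    A[h] = (a_{0,h}, ..., a_{p,h}); sigma[h] > 0; P is the K x K transition
    matrix; rho is the distribution of the first hidden state.
    Assumption: the first p observations are fixed at zero.
    """
    rng = np.random.default_rng(seed)
    A, sigma, P, rho = (np.asarray(v, dtype=float) for v in (A, sigma, P, rho))
    K, p = A.shape[0], A.shape[1] - 1
    y = np.zeros(T)
    z = np.full(T, -1)                      # -1 marks times before the chain starts
    for t in range(p, T):
        probs = rho if t == p else P[z[t - 1]]
        z[t] = rng.choice(K, p=probs)       # hidden regime at time t
        lags = y[t - p:t][::-1]             # (y_{t-1}, ..., y_{t-p})
        mean = A[z[t], 0] + A[z[t], 1:] @ lags
        y[t] = mean + sigma[z[t]] * rng.standard_normal()
    return y, z

# Hypothetical HM-MAR(2, 1) parameters.
A = np.array([[0.0, 0.5],
              [1.0, -0.3]])
sigma = np.array([1.0, 0.5])
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
rho = np.array([0.6, 0.4])
y, z = simulate_hm_mar(200, A, sigma, P, rho, seed=1)
```

Replacing `P` by a matrix with identical rows makes the regimes i.i.d., which
is the MAR special case described above.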
2 Estimation

In this section, we discuss estimation of the parameters of an
HM-MAR($K$, $p$) model. A new algorithm is proposed based on a modification
of the Baum-Welch and EM [2] algorithms. The Baum-Welch algorithm was
originally proposed for parameter estimation in hidden Markov models (HMM);
for a comprehensive review see MacDonald and Zucchini [1]. In an HMM each
observation depends only on the state of a hidden variable, whereas in
HM-MAR past observations also affect the next time series observation.
First we justify that the modification of the Baum-Welch algorithm is
correct, and then we modify the EM algorithm for the case where the latent
variable follows a hidden Markov process.

Let $A_j = (a_{0,j}, \ldots, a_{p,j})'$; then
$\theta = \{A_j, \sigma_j, \rho_j, \pi_{mn};\ m, n, j = 1, \ldots, K\}$
constitutes the parameter set of the HM-MAR model, which includes
$K^2 + (p + 2)K$ parameters. As $Y_t$ given $Z_t$ forms a $p$-tuple Markov
process in the HM-MAR model, its conditional distribution can be written as

$$F(y_t \mid y_0, \ldots, y_{t-1}, z_t) = \prod_{k=1}^{K} \Phi\left(\frac{y_t - Y_{t-1}' A_k}{\sigma_k}\right)^{I(z_t = k)}, \tag{7}$$

where $Y_t = (1, y_t, \ldots, y_{t-p+1})'$. The conditional distribution
$P(z_t \mid z_{t-1})$ is given by

$$P(Z_t = z_t \mid Z_{t-1} = z_{t-1}) = \prod_{j} \prod_{k} \pi_{j,k}^{\,I(z_t = k)\, I(z_{t-1} = j)}. \tag{8}$$
2.1 Extension of Baum-Welch Algorithm

Lemma 2.1. Let $\{y_t\}_{t=0}^{T}$ be a set of time series observations and
$\{Z_t\}$ be the set of correct predictor indexes. In the HM-MAR model, the
next time series observations depend only on the last correct predictor.
That is, for $t < k \le T$,

$$F(y_{t+1}, \ldots, y_k \mid y_1, \ldots, y_t, \{z_s\}_{s \in \mathbb{N},\, s \le t}) = F(y_{t+1}, \ldots, y_k \mid y_1, \ldots, y_t, Z_t). \tag{9}$$

Proof. Considering the homogeneous hidden Markov structure assumed for
$\{Z_t\}$ in the HM-MAR model, and the assumption that $y_t$, given $Z_t$,
depends only on the $p$ lagged time series observations through (7), we
prove (9) by induction. For $k = t + 1$ we have

$$\begin{aligned}
F(y_{t+1} \mid y_1, \ldots, y_t, \{z_s\}_{s \in \mathbb{N},\, s \le t})
&= \sum_{j=1}^{K} F(y_{t+1}, Z_{t+1} = j \mid y_1, \ldots, y_t, \{Z_s\}_{s \in \mathbb{N},\, s \le t}) \\
&= \sum_{j=1}^{K} F(y_{t+1} \mid y_1, \ldots, y_t, Z_{t+1} = j)\, P(Z_{t+1} = j \mid Z_t),
\end{aligned}$$

which is independent of $\{Z_{t-i},\ i \in \mathbb{N},\ i \ge 1\}$. Now assume
that equation (9) holds for $k = \ell$ with $t + 1 \le \ell < T$, that is

$$F(y_{t+1}, \ldots, y_\ell \mid y_1, \ldots, y_t, \{Z_s\}_{s \in \mathbb{N},\, s \le t}) = F(y_{t+1}, \ldots, y_\ell \mid y_1, \ldots, y_t, Z_t). \tag{10}$$

We show that (9) is valid for $k = \ell + 1$:

$$\begin{aligned}
F(y_{t+1}, \ldots, y_\ell, y_{\ell+1} \mid y_1, \ldots, y_t, \{z_s\}_{s \in \mathbb{N},\, s \le t})
&= \sum_{j=1}^{K} F(y_{\ell+1} \mid y_1, \ldots, y_\ell, Z_{\ell+1} = j)\, P(Z_{\ell+1} = j \mid y_1, \ldots, y_\ell, \{Z_s\}_{s \in \mathbb{N},\, s \le t}) \\
&\qquad \times F(y_{t+1}, \ldots, y_\ell \mid y_1, \ldots, y_t, \{Z_s\}_{s \in \mathbb{N},\, s \le t}) \\
&= \sum_{j=1}^{K} F(y_{\ell+1} \mid y_1, \ldots, y_\ell, Z_{\ell+1} = j)\, P(Z_{\ell+1} = j \mid Z_t) \\
&\qquad \times F(y_{t+1}, \ldots, y_\ell \mid y_1, \ldots, y_t, \{Z_s\}_{s \in \mathbb{N},\, s \le t}),
\end{aligned}$$

which is independent of $\{Z_{t-i}\}_{i \ge 1}$ by the induction assumption
(10). □

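Theorem 2.1. For $t > p$, let

$$\alpha_t(h) = F(y_{p+1}, \ldots, y_t, Z_t = h \mid y_1, \ldots, y_p), \tag{11}$$

$$\beta_t(h) = F(y_{t+1}, \ldots, y_T \mid y_1, \ldots, y_t, Z_t = h). \tag{12}$$

Then $\alpha_t(h)$ and $\beta_t(h)$ can be calculated by the Baum-Welch
forward-backward recursions

$$\begin{aligned}
\alpha_{t+1}(h) &= \sum_{m} \pi_{mh}\, \alpha_t(m)\, \Phi\left(\frac{y_{t+1} - Y_t' A_h}{\sigma_h}\right), \\
\beta_t(h) &= \sum_{j=1}^{K} \beta_{t+1}(j)\, \pi_{hj}\, \Phi\left(\frac{y_{t+1} - Y_t' A_j}{\sigma_j}\right).
\end{aligned} \tag{13}$$

The forward recursion starts with
$\alpha_{p+1}(h) = \rho_h \Phi\{(y_{p+1} - Y_p' A_h)/\sigma_h\}$ and the
backward recursion starts at $\beta_T(h) = 1$, in which $\Phi(\cdot)$ is the
standard normal distribution function.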
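Proof. $\alpha_t(h)$ in equation (11) can be written as

$$\begin{aligned}
\alpha_{t+1}(h) &= \sum_{m} F(y_{p+1}, \ldots, y_{t+1}, Z_t = m, Z_{t+1} = h \mid y_1, \ldots, y_p) \\
&= \sum_{m} P(Z_{t+1} = h \mid Z_t = m, y_1, \ldots, y_t) \times F(y_{t+1} \mid y_1, \ldots, y_t, Z_t = m, Z_{t+1} = h) \\
&\qquad \times F(y_{p+1}, \ldots, y_t, Z_t = m \mid y_1, \ldots, y_p) \\
&= \sum_{m} \pi_{mh}\, \alpha_t(m)\, \Phi\left(\frac{y_{t+1} - Y_t' A_h}{\sigma_h}\right).
\end{aligned} \tag{14}$$

Also, by Lemma 2.1, for $\beta_t(h)$ in equation (12) we have

$$\begin{aligned}
\beta_t(h) &= \sum_{j=1}^{K} F(Z_{t+1} = j, y_{t+1}, \ldots, y_T \mid y_1, \ldots, y_t, Z_t = h) \\
&= \sum_{j=1}^{K} F(y_{t+2}, \ldots, y_T \mid y_1, \ldots, y_t, y_{t+1}, Z_t = h, Z_{t+1} = j) \\
&\qquad \times F(y_{t+1} \mid y_1, \ldots, y_t, Z_t = h, Z_{t+1} = j)\, P(Z_{t+1} = j \mid y_1, \ldots, y_t, Z_t = h) \\
&= \sum_{j=1}^{K} \beta_{t+1}(j)\, \pi_{hj}\, \Phi\left(\frac{y_{t+1} - Y_t' A_j}{\sigma_j}\right).
\end{aligned} \tag{15}$$

□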
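
The recursions (13) can be sketched in a few lines of NumPy. This is an
illustrative sketch, not the authors' implementation: it assumes each
factor written with $\Phi(\cdot)$ is evaluated as a normal density with
scale $\sigma_h$ (as the log-density terms in (16) suggest), uses 0-based
arrays where row `r` corresponds to time $t = p + 1 + r$, and applies no
scaling, so it is only suitable for short series. The function names and
parameter values are hypothetical.

```python
import numpy as np

def emission_densities(y, A, sigma):
    """dens[r, h]: normal density of y[p + r] under regime h (0-based arrays)."""
    y, A, sigma = (np.asarray(v, dtype=float) for v in (y, A, sigma))
    T, p = len(y), A.shape[1] - 1
    # Row r of X is (1, y[t-1], ..., y[t-p]) with t = p + r.
    X = np.column_stack([np.ones(T - p)] + [y[p - i:T - i] for i in range(1, p + 1)])
    resid = y[p:, None] - X @ A.T
    return np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def forward_backward(dens, P, rho):
    """Unscaled forward and backward variables for the recursions in (13)."""
    n, K = dens.shape
    alpha = np.zeros((n, K))
    beta = np.ones((n, K))                            # last row stays at 1
    alpha[0] = rho * dens[0]                          # forward start
    for r in range(1, n):
        alpha[r] = (alpha[r - 1] @ P) * dens[r]       # sum_m pi_{mh} alpha(m) dens(h)
    for r in range(n - 2, -1, -1):
        beta[r] = P @ (beta[r + 1] * dens[r + 1])     # sum_j pi_{hj} beta(j) dens(j)
    return alpha, beta

# Hypothetical HM-MAR(2, 1) example on six observations.
y = np.array([0.1, -0.3, 0.5, 0.2, -0.1, 0.4])
A = np.array([[0.0, 0.5], [1.0, -0.3]])
sigma = np.array([1.0, 0.5])
P = np.array([[0.9, 0.1], [0.3, 0.7]])
rho = np.array([0.6, 0.4])
dens = emission_densities(y, A, sigma)
alpha, beta = forward_backward(dens, P, rho)
likelihood = alpha[-1].sum()
```

A useful sanity check is that $\sum_h \alpha_t(h)\beta_t(h)$ takes the same
value for every $t$, namely the conditional likelihood
$\sum_j \alpha_T(j)$.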



2.2 Modification of EM Algorithm 

The EM algorithm is used to maximize the complete-data log-likelihood. By
complete data we mean the set of time series observations
$\{y_t\}_{t=1}^{T}$ augmented with the latent set of correct predictor
indicators $\{z_t\}_{t=p+1}^{T}$ (i.e.
$\{\{y_t\}_{t=1}^{T}, \{z_t\}_{t=p+1}^{T}\}$). By the method of iterative
conditioning, this log-likelihood can be represented as

$$\begin{aligned}
\ell^{*}(\theta) &= \log F(y_{p+1}, \ldots, y_T, z_{p+1}, \ldots, z_T \mid y_1, \ldots, y_p) \\
&= \sum_{t=p+1}^{T} \log F(y_t \mid y_{t-1}, \ldots, y_0, z_t)
 + \sum_{t=p+2}^{T} \log P(z_t \mid z_{t-1}, \ldots, z_{p+1}, y_{t-1}, \ldots, y_0)
 + \log P(z_{p+1} \mid y_1, \ldots, y_p) \\
&= \sum_{t=p+1}^{T} \sum_{k} I(z_t = k) \log \Phi\left(\frac{y_t - Y_{t-1}' A_k}{\sigma_k}\right)
 + \sum_{t=p+2}^{T} \sum_{k} \sum_{j} I(z_t = k)\, I(z_{t-1} = j) \log \pi_{j,k}
 + \sum_{k} I(z_{p+1} = k) \log \rho_k,
\end{aligned}$$

where the last equality holds by (7) and the Markov property of $\{Z_t\}$
with transition probabilities in (8). It is clear that
$\sum_{t=p+2}^{T} I(z_t = k)\, I(z_{t-1} = j)$ is equal to the number of
transitions from state $j$ to state $k$. At the E-step, the algorithm
computes the conditional expected value of each $I(z_t = k)$ and
$I(z_t = k)\, I(z_{t-1} = j)$ given the observed data.

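$$\begin{aligned}
E[\ell^{*}(\theta) \mid y_1, \ldots, y_T]
&= \sum_{t=p+2}^{T} \sum_{k} \sum_{j} P(z_t = k, z_{t-1} = j \mid y_1, \ldots, y_T) \log \pi_{j,k} \\
&\quad + \sum_{t=p+1}^{T} \sum_{k} P(z_t = k \mid y_1, \ldots, y_T) \left\{ -\log(\sqrt{2\pi}) - \log(\sigma_k) - \frac{(y_t - Y_{t-1}' A_k)^2}{2\sigma_k^2} \right\} \\
&\quad + \sum_{k} P(z_{p+1} = k \mid y_1, \ldots, y_T) \log \rho_k.
\end{aligned} \tag{16}$$

The last equation holds by linearity of expectation and because
$\Phi\{(y_t - Y_{t-1}' A_k)/\sigma_k\}$ is measurable with respect to
$\sigma\{Y_1, \ldots, Y_T\}$. Also
$E(I(z_t = k) \mid y_1, \ldots, y_T) = P(z_t = k \mid y_1, \ldots, y_T)$ and
$E(I(z_t = k)\, I(z_{t-1} = j) \mid y_1, \ldots, y_T) = P(z_t = k, z_{t-1} = j \mid y_1, \ldots, y_T)$.
These posterior probabilities can be obtained by the following lemma.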

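Lemma 2.2. $P(Z_t = h \mid Y_1, \ldots, Y_T)$ and
$P(Z_t = j, Z_{t-1} = i \mid Y_1, \ldots, Y_T)$ in equation (16) can be
calculated as

$$P(Z_t = h \mid Y_1, \ldots, Y_T) = \frac{\alpha_t(h)\, \beta_t(h)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)},$$

$$P(Z_t = j, Z_{t-1} = i \mid Y_1, \ldots, Y_T) = \frac{\beta_t(j)\, \pi_{ij}\, \alpha_{t-1}(i)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)}\, \Phi\left(\frac{y_t - Y_{t-1}' A_j}{\sigma_j}\right),$$

in which $F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p) = \sum_{j=1}^{K} \alpha_T(j)$
and $\{\alpha_t(\cdot), \beta_t(\cdot)\}_{t=p+1}^{T}$ are calculated by
Theorem 2.1.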

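Proof. Using equations (11) and (12) we have

$$\begin{aligned}
P(Z_t = h \mid Y_1, \ldots, Y_T)
&= \frac{F(Z_t = h, Y_1, \ldots, Y_T)}{F(Y_1, \ldots, Y_T)} \\
&= \frac{F(Z_t = h, Y_{p+1}, \ldots, Y_t \mid Y_1, \ldots, Y_p)\, F(Y_{t+1}, \ldots, Y_T \mid Z_t = h, Y_1, \ldots, Y_t)\, F(Y_1, \ldots, Y_p)}{F(Y_1, \ldots, Y_T)} \\
&= \frac{\alpha_t(h)\, \beta_t(h)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)},
\end{aligned}$$

in which

$$F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p) = \sum_{j=1}^{K} F(Y_{p+1}, \ldots, Y_T, Z_T = j \mid Y_1, \ldots, Y_p) = \sum_{j=1}^{K} \alpha_T(j),$$

and by Lemma 2.1, (7) and the Markov property of $\{Z_t\}$ we have that

$$\begin{aligned}
P(Z_t = j, Z_{t-1} = i \mid y_1, \ldots, y_T)
&= F(y_{t+1}, \ldots, y_T \mid y_1, \ldots, y_t, z_t, z_{t-1})\, F(y_t \mid y_1, \ldots, y_{t-1}, z_t, z_{t-1}) \\
&\qquad \times \frac{P(z_t \mid y_1, \ldots, y_{t-1}, z_{t-1})\, F(y_{p+1}, \ldots, y_{t-1}, z_{t-1} \mid y_1, \ldots, y_p)\, F(y_1, \ldots, y_p)}{F(y_1, \ldots, y_T)} \\
&= \frac{\beta_t(j)\, \pi_{ij}\, \alpha_{t-1}(i)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)}\, \Phi\left(\frac{y_t - Y_{t-1}' A_j}{\sigma_j}\right).
\end{aligned}$$

□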
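
The two posterior formulas of Lemma 2.2 translate directly into array
operations once the forward and backward variables are available. The
sketch below is an illustration under stated assumptions: `dens[r, h]` holds
hypothetical emission values for three consecutive time points, arrays are
0-based, and `xi[r, i, j]` stores the joint posterior of state `i` at row
`r` and state `j` at row `r + 1`. The function name is hypothetical.

```python
import numpy as np

def smoothed_posteriors(alpha, beta, dens, P):
    """gamma[r, h] = P(Z_r = h | y); xi[r, i, j] = P(Z_r = i, Z_{r+1} = j | y)."""
    lik = alpha[-1].sum()                     # sum_j alpha_T(j)
    gamma = alpha * beta / lik
    # alpha_{t-1}(i) * pi_{ij} * dens_t(j) * beta_t(j) / likelihood
    xi = (alpha[:-1, :, None] * P[None, :, :]
          * (dens[1:] * beta[1:])[:, None, :]) / lik
    return gamma, xi

# Hypothetical inputs: transition matrix, initial probabilities, emission values.
P = np.array([[0.9, 0.1], [0.3, 0.7]])
rho = np.array([0.6, 0.4])
dens = np.array([[0.30, 0.10], [0.05, 0.40], [0.20, 0.25]])

# Forward and backward variables for this three-step example.
alpha = np.zeros_like(dens)
beta = np.ones_like(dens)
alpha[0] = rho * dens[0]
for r in (1, 2):
    alpha[r] = (alpha[r - 1] @ P) * dens[r]
for r in (1, 0):
    beta[r] = P @ (beta[r + 1] * dens[r + 1])

gamma, xi = smoothed_posteriors(alpha, beta, dens, P)
```

Summing `xi` over either state index recovers the marginal posteriors, which
is a quick consistency check on both formulas.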

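In the M-step, the roots of the equations
$\partial E[\ell^{*}(\theta) \mid y_1, \ldots, y_T] / \partial \theta_i = 0$,
$\theta_i \in \theta$, are calculated.

Theorem 2.2. Let $\mathbf{Y} = (Y_p, \ldots, Y_{T-1})$,
$\mathbf{y} = (y_{p+1}, \ldots, y_T)'$ and
$P_k = \mathrm{diag}(P(Z_{p+1} = k \mid y_1, \ldots, y_T), \ldots, P(Z_T = k \mid y_1, \ldots, y_T))$.
Then the maximum likelihood estimates of the parameters of the HM-MAR model
are given by

$$\hat{A}_k = (\mathbf{Y} P_k \mathbf{Y}')^{-1} \mathbf{Y} P_k \mathbf{y}, \tag{18}$$

$$\hat{\sigma}_k^2 = \frac{\mathbf{y}' P_k \mathbf{y} - 2\, \mathbf{y}' P_k \mathbf{Y}' \hat{A}_k + \hat{A}_k' \mathbf{Y} P_k \mathbf{Y}' \hat{A}_k}{\mathrm{tr}(P_k)}
 = \frac{\mathbf{y}' P_k \left(I - \mathbf{Y}' (\mathbf{Y} P_k \mathbf{Y}')^{-1} \mathbf{Y} P_k\right) \mathbf{y}}{\mathrm{tr}(P_k)}, \tag{19}$$

$$\hat{\pi}_{j,i} = \frac{\sum_{t=p+2}^{T} P(Z_t = i, Z_{t-1} = j \mid y_1, \ldots, y_T)}{\sum_{t=p+2}^{T} P(Z_{t-1} = j \mid y_1, \ldots, y_T)}, \tag{20}$$

$$\hat{\rho}_j = \frac{\sum_{t=p+1}^{T} P(Z_t = j \mid y_1, \ldots, y_T)}{T - p}. \tag{21}$$

Proof. Calculating
$\partial E[\ell^{*}(\theta) \mid y_1, \ldots, y_T] / \partial A_k = 0$, we
obtain

$$\begin{aligned}
& \sum_{t=p+1}^{T} P(z_t = k \mid y_1, \ldots, y_T)\, Y_{t-1} (y_t - Y_{t-1}' A_k) = 0 \\
\Rightarrow\ & \mathbf{Y} P_k \mathbf{y} - \mathbf{Y} P_k \mathbf{Y}' A_k = 0 \\
\Rightarrow\ & \hat{A}_k = (\mathbf{Y} P_k \mathbf{Y}')^{-1} \mathbf{Y} P_k \mathbf{y}.
\end{aligned} \tag{22}$$

Calculating
$\partial E[\ell^{*}(\theta) \mid y_1, \ldots, y_T] / \partial \sigma_k = 0$,
we obtain

$$\begin{aligned}
& \sum_{t=p+1}^{T} P(z_t = k \mid y_1, \ldots, y_T) \left( -\frac{1}{\sigma_k} + \frac{(y_t - Y_{t-1}' A_k)^2}{\sigma_k^3} \right) = 0 \\
\Rightarrow\ & \mathrm{tr}(P_k)\, \sigma_k^2 = (\mathbf{y} - \mathbf{Y}' A_k)' P_k (\mathbf{y} - \mathbf{Y}' A_k) \\
& \phantom{\mathrm{tr}(P_k)\, \sigma_k^2} = \mathbf{y}' P_k \mathbf{y} - 2\, \mathbf{y}' P_k \mathbf{Y}' A_k + A_k' \mathbf{Y} P_k \mathbf{Y}' A_k,
\end{aligned} \tag{23}$$

since $(\mathbf{y}' P_k \mathbf{Y}' A_k)' = A_k' \mathbf{Y} P_k \mathbf{y}$.
Replacing $A_k$ from equation (22), we obtain equation (19) for
$\hat{\sigma}_k^2$.

Since for each $j = 1, \ldots, K$ the $j$-th row of the transition matrix
$P$ of the Markov process $Z_t$ satisfies $\sum_{i=1}^{K} \pi_{j,i} = 1$, we
have

$$\pi_{j,K} = 1 - \sum_{i=1}^{K-1} \pi_{j,i}. \tag{24}$$

Calculating the roots of
$\partial E[\ell^{*}(\theta) \mid y_1, \ldots, y_T] / \partial \pi_{j,i} = 0$,
by equation (24), we have

$$\begin{aligned}
& \frac{\sum_{t=p+2}^{T} P(z_t = i, z_{t-1} = j \mid y_1, \ldots, y_T)}{\pi_{j,i}}
 = \frac{\sum_{t=p+2}^{T} P(z_t = K, z_{t-1} = j \mid y_1, \ldots, y_T)}{\pi_{j,K}} \\
\Rightarrow\ & \hat{\pi}_{j,i} = \frac{\sum_{t=p+2}^{T} P(Z_t = i, Z_{t-1} = j \mid y_1, \ldots, y_T)}{\sum_{t=p+2}^{T} P(Z_{t-1} = j \mid y_1, \ldots, y_T)}.
\end{aligned} \tag{25}$$

In a similar way we obtain equation (21) for $\rho_j$, $j = 1, \ldots, K$. □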
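
The closed-form updates (18)-(21) amount to a weighted least-squares fit per
regime plus normalised transition counts. The following is a minimal sketch
under stated assumptions rather than a reference implementation: arrays are
0-based, `gamma[r, k]` and `xi[r, j, i]` are the posterior probabilities of
Lemma 2.2 (state `j` at row `r`, state `i` at row `r + 1`), and the function
name `m_step` and the test data are hypothetical.

```python
import numpy as np

def m_step(y, p, gamma, xi):
    """One M-step: returns (A, sigma, trans, rho) from posterior weights."""
    y = np.asarray(y, dtype=float)
    T, K = len(y), gamma.shape[1]
    # Rows of X are the regressors (1, y[t-1], ..., y[t-p]) for t = p..T-1.
    X = np.column_stack([np.ones(T - p)] + [y[p - i:T - i] for i in range(1, p + 1)])
    target = y[p:]
    A = np.zeros((K, p + 1))
    sigma = np.zeros(K)
    for k in range(K):
        w = gamma[:, k]                                  # diagonal of P_k
        XtW = X.T * w                                    # plays the role of Y P_k
        A[k] = np.linalg.solve(XtW @ X, XtW @ target)    # equation (18)
        resid = target - X @ A[k]
        sigma[k] = np.sqrt((w * resid ** 2).sum() / w.sum())   # equation (19)
    trans = xi.sum(axis=0)                               # expected transition counts
    trans = trans / trans.sum(axis=1, keepdims=True)     # equation (20)
    rho = gamma.mean(axis=0)                             # equation (21)
    return A, sigma, trans, rho

# Hypothetical check with K = 1: the update must coincide with ordinary least squares.
rng = np.random.default_rng(0)
y = rng.standard_normal(40)
p = 1
gamma = np.ones((len(y) - p, 1))
xi = np.ones((len(y) - p - 1, 1, 1))
A_hat, sigma_hat, trans_hat, rho_hat = m_step(y, p, gamma, xi)
```

Solving the normal equations with `np.linalg.solve` avoids forming the
explicit inverse $(\mathbf{Y} P_k \mathbf{Y}')^{-1}$, which is a numerical
design choice and not something the text prescribes.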

2.3 Learning 

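A brief summary of the HM-MAR($K$, $p$) parameter estimation algorithm is as
follows:

1. For $t = 1$ to $T$ do

   $Y_t = (1, y_t, \ldots, y_{t-p+1})'$

2. Let

   $\mathbf{Y} = (Y_p, \ldots, Y_{T-1})$

   $\mathbf{y} = (y_{p+1}, \ldots, y_T)'$

3. For $h = 1$ to $K$ do

   $A_h = (a_{0,h}, \ldots, a_{p,h})'$

4. Let

   $\rho_h = P(Z_{p+1} = h \mid y_1, \ldots, y_p)$

   $\theta = \{A_j, \sigma_j, \rho_j, \pi_{mn};\ m, n, j = 1, \ldots, K\}$

5. Initialize $\theta$ randomly.

6. Repeat until none of the parameters in $\theta$ changes:

   (a) $\alpha_{p+1}(h) = \rho_h\, \Phi\left(\frac{y_{p+1} - Y_p' A_h}{\sigma_h}\right)$

   (b) $\beta_T(h) = 1$

   (c) For $t = p + 1$ to $T - 1$ do

       . $\alpha_{t+1}(h) = \sum_{m} \pi_{mh}\, \alpha_t(m)\, \Phi\left(\frac{y_{t+1} - Y_t' A_h}{\sigma_h}\right)$

       For $t = T - 1$ down to $p + 1$ do

       . $\beta_t(h) = \sum_{j=1}^{K} \beta_{t+1}(j)\, \pi_{hj}\, \Phi\left(\frac{y_{t+1} - Y_t' A_j}{\sigma_j}\right)$

   (d) $F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p) = \sum_{j=1}^{K} \alpha_T(j)$

   (e) For $t = p + 1$ to $T$

       . $P(Z_t = h \mid Y_1, \ldots, Y_T) = \frac{\alpha_t(h)\, \beta_t(h)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)}$

       . $P(Z_t = j, Z_{t-1} = i \mid Y_1, \ldots, Y_T) = \frac{\beta_t(j)\, \pi_{ij}\, \alpha_{t-1}(i)}{F(Y_{p+1}, \ldots, Y_T \mid Y_1, \ldots, Y_p)}\, \Phi\left(\frac{y_t - Y_{t-1}' A_j}{\sigma_j}\right)$ (for $t \ge p + 2$)

   (f) $P_k = \mathrm{diag}(P(Z_{p+1} = k \mid y_1, \ldots, y_T), \ldots, P(Z_T = k \mid y_1, \ldots, y_T))$

   (g) Set the maximum likelihood estimates as

       * $\hat{A}_k = (\mathbf{Y} P_k \mathbf{Y}')^{-1} \mathbf{Y} P_k \mathbf{y}$

       * $\hat{\sigma}_k^2 = \frac{\mathbf{y}' P_k \left(I - \mathbf{Y}' (\mathbf{Y} P_k \mathbf{Y}')^{-1} \mathbf{Y} P_k\right) \mathbf{y}}{\mathrm{tr}(P_k)}$

       * $\hat{\pi}_{j,i} = \frac{\sum_{t=p+2}^{T} P(Z_t = i, Z_{t-1} = j \mid y_1, \ldots, y_T)}{\sum_{t=p+2}^{T} P(Z_{t-1} = j \mid y_1, \ldots, y_T)}$

       * $\hat{\rho}_j = \frac{\sum_{t=p+1}^{T} P(Z_t = j \mid Y_1, \ldots, Y_T)}{T - p}$

The convergence of the training algorithm follows from the general
convergence of expectation-maximization algorithms [2].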

Remark 2.1. If all rows of the transition probability matrix $P$ of the
hidden Markov chain $\{Z_t\}$ are estimated to be equal, then the $\{Z_t\}$
are independent (i.e. $P(z_{t+1} = j \mid z_t = i) = P(z_{t+1} = j)$) and

$$\begin{aligned}
\alpha_{t+1}^{(i)} &= \sum_{j=1}^{K} P(z_{t+1} = i \mid z_t = j)\, P(z_t = j \mid y_1, \ldots, y_t) \\
&= P(z_{t+1} = i) \sum_{j=1}^{K} P(z_t = j \mid y_1, \ldots, y_t) \\
&= P(z_{t+1} = i),
\end{aligned} \tag{26}$$

which implies that the weighting coefficients of the HM-MAR model can be
considered fixed after parameter estimation. Thus the HM-MAR model reduces
to a MAR model automatically, without any further parameter adjustment.
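
The identity (26) is easy to confirm numerically. In the sketch below the
common row `w` and the randomly drawn filtered probabilities are hypothetical
values used only to illustrate the remark; `predictive_weights` is an
illustrative helper name.

```python
import numpy as np

def predictive_weights(filtered, P):
    """sum_j P(Z_{t+1} = i | Z_t = j) * P(Z_t = j | y_1..y_t) for every i."""
    return np.asarray(filtered, dtype=float) @ np.asarray(P, dtype=float)

w = np.array([0.2, 0.5, 0.3])            # hypothetical common row of P
P_equal = np.tile(w, (3, 1))             # transition matrix with identical rows
rng = np.random.default_rng(1)
filtered_samples = rng.dirichlet(np.ones(3), size=5)   # arbitrary filtered probabilities
weights = np.array([predictive_weights(f, P_equal) for f in filtered_samples])
```

Every row of `weights` equals `w`, whatever the filtered probabilities are,
so the mixture weights no longer depend on past observations.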